Fix `gather()` for `STRUCT` inputs with no nulls in members. #9194
Fixes rapidsai#9188. Prior to this fix, when `cudf::gather()` is called on a STRUCT input column, the null masks of the children of the result column would not be set correctly if the child columns do not contain nulls. This fix enforces null mask calculation if NULLIFY is set.
#9188 was obscured inadvertently in the tests. The gather map used in the tests looked like: auto const gather_map = gather_map_t{-1, 4, 3, 2, 1, 7, 3}; The intention was to cause the gathered output to have a null row at index |
Verified this fixes the failing Spark tests that prompted the filing of #9188. Thanks, @mythrocks!
Credit where it's due: Thank you for the fix, @jlowe. I was convinced the problem had more to do with not calling |
The Python test failures look unrelated to this change:
|
Rerun tests. |
Codecov Report
@@ Coverage Diff @@
## branch-21.10 #9194 +/- ##
===============================================
Coverage ? 10.82%
===============================================
Files ? 115
Lines ? 19166
Branches ? 0
===============================================
Hits ? 2074
Misses ? 17092
Partials ? 0
|
I've rebased to latest on |
Looks good. Just mild suggestions on style.
@gpucibot merge |
This PR fixes the `gather` API for struct columns when the input is a sliced column. Previously, `gather` called `child_begin()` and `child_end()` to access the child columns, so the output was incorrect when the input struct column was sliced. This closes #9213, and is blocked by #9194 because of conflicting changes.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- MithunR (https://github.com/mythrocks)
- Mark Harris (https://github.com/harrism)

URL: #9218
Fixes #9188.

Prior to this fix, when `cudf::gather()` is called on a STRUCT input column, the null masks of the children of the result column would not be set correctly if the child columns do not contain nulls. This fix enforces null mask calculation if `NULLIFY` is set.

In addition, this commit also cleans up the `TypedStructGatherTest` test suite:
- … `STRUCT` column construction.
- … `assert()` conditions.
- … `column` construction to `column_wrapper` and `column_view`.
- … `TestGatherStructOfListOfStructs`.